A Methodology for Exploring Communication Architectures of Clustered VLIW Processors
Abstract
VLIW processors have started gaining acceptance in the embedded systems domain. However, monolithic register file VLIW processors with a large number of functional units (FUs) are not viable, because the register file needs a large number of ports to support the FUs, which makes it expensive and extremely slow. A simple solution is to break up this register file into a number of small register files, each with a subset of the FUs connected to it. Such architectures are termed clustered VLIW processors. This thesis focuses on customizing inter-cluster interconnection networks (ICNs) in high issue-rate clustered VLIW processors. While a wide variety of inter-cluster ICNs are reported in the literature, what is missing is a quantitative evaluation of this design space. Researchers have used specific tools and methodologies for architecting such VLIW processors, wherein some of the other ICNs are eliminated qualitatively. We build a basis for exploring high issue-rate processors by showing that, on average, media applications could have an ILP of 20 or even higher. Towards this end, we classify the previously reported results on ILP measurement and coin a novel measurement technique, Achievable-H ILP, which is useful for predicting future architecture requirements. We present a methodology, along with the supporting tool chain, for exploring the design space of inter-cluster ICNs. We also classify the previously used architectures and demonstrate that a vast part of this design space is currently unexplored. We conclusively establish that most of the bus-based RF-to-RF style ICNs are heavily performance constrained. Finally, to prove the superiority of point-to-point type ICNs, we develop a parameterized clustered VLIW generator. Using the generated architectures as input to industry-standard synthesis and place-and-route tools, we present results on the implementation characteristics of the various ICNs.
Related resources
Exploring Energy-Performance Trade-Offs for Heterogeneous Interconnect Clustered VLIW Processors
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making design simpler, it introduces extra overheads by way of inter-cluster communication. This communication ...
Pragmatic integrated scheduling for clustered VLIW architectures
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Scheduling for clustered architectures involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). Various clustered VLIW configurations, connectivity types, and inter-cluste...
Stream Execution on Embedded Wide-Issue Clustered VLIW Architectures
Very long instruction word (VLIW)-based processors have become widely adopted as a basic building block in modern System-on-Chip designs. Advances in clustered VLIW architectures have extended the scalability of the VLIW architecture paradigm to a large number of functional units and very-wide-issue widths. A central challenge with wide-issue clustered VLIW architecture is the availability of pr...
Thesis - Vasileios Porpodas
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitio...
Compiler-assisted power optimization for clustered VLIW architectures
Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...
Publication date: 2005